Real-time audio-visual voice activity detection for speech recognition in noisy environments

Authors

  • Carlos Toshinori Ishi
  • Miki Sato
  • Norihiro Hagita
  • Shihong Lao
Abstract

Voice activity detection (VAD) is one of the most critical factors in the performance degradation of speech recognition in noisy environments. A real-time VAD was developed that uses face parameters (eye and lip contours) as a front-end for the traditional GMM-based speech and noise (audio) method. For shopping-mall background noise, the speech recognition performance of the audio-visual VAD is shown to be comparable with that of audio-only VAD. Advantages and limitations of introducing the visual information are discussed.
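
The abstract does not say how the visual front-end and the audio GMMs are combined, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes a one-dimensional audio feature (log energy), a single hypothetical visual feature (`lip_opening`), and a simple gating rule: a frame is scored by the speech and noise GMMs only when the lips appear open. All function names, parameters, and thresholds are invented for illustration.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def gmm_loglik(x, components):
    """Log-likelihood of x under a 1-D GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss(x, m, v) for w, m, v in components]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def av_vad(frames, speech_gmm, noise_gmm, lip_threshold=0.2, llr_threshold=0.0):
    """Label each (log_energy, lip_opening) frame as speech (True) or not.

    Assumption: the visual cue acts as a gate in front of the audio test.
    Frames with closed lips are rejected without consulting the GMMs.
    """
    decisions = []
    for log_energy, lip_opening in frames:
        if lip_opening < lip_threshold:
            decisions.append(False)  # visual front-end rejects the frame
            continue
        # Audio stage: log-likelihood ratio between speech and noise models
        llr = gmm_loglik(log_energy, speech_gmm) - gmm_loglik(log_energy, noise_gmm)
        decisions.append(llr > llr_threshold)
    return decisions
```

In this sketch the visual gate can only remove frames the audio stage might have accepted, so it can reduce false alarms from background speech-like noise but cannot recover speech that the face tracker misses.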

Similar resources

Two-layered audio-visual integration in voice activity detection and automatic speech recognition for robots

Automatic Speech Recognition (ASR) which plays an important role in human-robot interaction should be noise-robust because robots are expected to work in noisy environments. Audio-Visual (AV) integration is one of the key ideas to improve the robustness in such environments. This paper proposes two-layered AV integration for ASR which applies AV integration to Voice Activity Detection (VAD) and...

A robust audio-visual speech recognition using audio-visual voice activity detection

This paper proposes a novel speech recognition method combining Audio-Visual Voice Activity Detection (AVVAD) and Audio-Visual Automatic Speech Recognition (AVASR). AVASR has been developed to enhance the robustness of ASR in noisy environments, using visual information in addition to acoustic features. Similarly, AVVAD increases the precision of VAD in noisy conditions, which detects presence ...

Audio-visual speech recognition system for a robot

Automatic Speech Recognition (ASR) for a robot should be robust for noises because a robot works in noisy environments. Audio-Visual (AV) integration is one of the key ideas to improve its robustness in such environments. This paper proposes AV integration for an ASR system for a robot which applies AV integration to Voice Activity Detection (VAD) and speech decoding. In VAD, we apply AV-integr...

Robust voice activity detection using perceptual wavelet-packet transform and Teager energy operator

In this letter, a robust voice activity detection (VAD) algorithm is presented. This proposed VAD algorithm makes use of the perceptual wavelet-packet transform and the Teager energy operator to compute a robust parameter called voice activity shape for VAD. The main advantage of this algorithm is that the preset threshold values or a priori knowledge of the SNR usually needed in conventional V...

Improved voice activity detection combining noise reduction and subband divergence measures

Currently, new trends in wireless communications are demanding reliable human-machine interaction in real-life environments. However, there are obstacles inhibiting automatic speech recognition systems (ASR) working in noisy environments. The main difficulty is the degradation suffered by ASR systems due to a mismatch between training and test conditions. This paper shows an improved voice acti...

Publication date: 2010